On the use of agglomerative and spectral clustering in speaker diarization of meetings
Abstract
In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems rely on agglomerative hierarchical clustering (AHC) that uses the Bayesian Information Criterion (BIC) and other statistical metrics between clusters, which makes them computationally expensive and time-consuming. Our proposal avoids such metrics by applying Euclidean distances to the eigenvectors computed from the normalized graph Laplacian. We propose a hybrid system in which HMM/GMM modelling and Viterbi alignment are still applied, but the BIC-based merging and stopping criteria are replaced by a spectral clustering algorithm. Once an initial segmentation is obtained and the clustering alignment is computed using the Viterbi algorithm, each remaining cluster is modeled by stacking the means of its Gaussians into a supervector. In this space, the singular value decomposition of the associated normalized graph Laplacian is computed. The most similar clusters are merged based on their Euclidean distances in the resulting eigenspace. The number of clusters is estimated by analyzing the eigenstructure of the similarity matrix and applying a threshold to the eigenvalue gap. In experiments on the Rich Transcription conference evaluation data, this approach achieves performance comparable to the traditional AHC+BIC approach. Although it still relies on Gaussian modelling of clusters and Viterbi alignment, the proposed system runs several times faster than the traditional one.
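The spectral step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian affinity kernel, the `sigma` and `gap_threshold` values, the row normalization of the embedding, and all function names are assumptions made for illustration. The sketch also reads the eigenvalue gap from the Laplacian spectrum and uses a symmetric eigendecomposition, which for a symmetric positive semidefinite matrix yields the same values as the singular value decomposition.

```python
import numpy as np

def normalized_laplacian(supervectors, sigma=1.0):
    """Build L = I - D^{-1/2} A D^{-1/2} from cluster supervectors.

    The Gaussian affinity and sigma are assumptions, not taken from the paper.
    """
    X = np.asarray(supervectors, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-similarity
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def estimate_num_clusters(eigvals, gap_threshold=0.5):
    """Return k such that the gap after the k-th smallest eigenvalue is large.

    Uses the first gap above the threshold (hypothetical value), falling back
    to the largest gap when none exceeds it.
    """
    gaps = np.diff(eigvals)
    if gaps.size == 0:
        return 1
    above = np.flatnonzero(gaps > gap_threshold)
    idx = above[0] if above.size else np.argmax(gaps)
    return int(idx) + 1

def spectral_embedding(eigvecs, k):
    """Keep the k eigenvectors of smallest eigenvalue, rows scaled to unit norm.

    Row normalization is an assumption borrowed from common practice.
    """
    U = eigvecs[:, :k]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def closest_pair(embedding):
    """Return indices (i, j), i < j, of the two clusters closest in eigenspace."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore zero self-distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(min(i, j)), int(max(i, j))

# Toy supervectors: two well-separated groups of three clusters each.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
laplacian = normalized_laplacian(toy, sigma=1.0)
eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
k = estimate_num_clusters(eigvals, gap_threshold=0.5)
merge_candidates = closest_pair(spectral_embedding(eigvecs, k))
```

In a hybrid system of the kind the abstract outlines, the pair returned by `closest_pair` would be the merge candidate, and the estimate `k` would play the role of the stopping criterion in place of BIC.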
Similar resources
Multi-stage Speaker Diarization for Conference and Lecture Meetings
The LIMSI RT-07S speaker diarization system for the conference and lecture meetings is presented in this paper. This system builds upon the RT06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on Bayesian information criterion (BIC) with a second clustering using state-of-the-art speaker identification (SID) techniques. Since the baseli...
Speaker Diarization in Meetings Domain
The purpose of this study is to develop robust techniques for speaker segmentation and clustering, with a focus on the meetings domain. The techniques examined can, however, be applied to other domains such as telephone and broadcast news. Traditional techniques for speaker diarization developed for telephone conversations or broadcast news are based on a single channel, which is notably different f...
Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System
In this paper we present the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s) [1] conducted on the meetings environment. The presented system is based on the RT05s system, which uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to decide which pairs of clusters to merge and to determine when to stop merging clu...
Priors for Speaker Counting and Diarization with AHC
Estimating the number of speakers in an audio segment is a necessary step in the process of speaker diarization, but current diarization algorithms do not explicitly define a prior probability on this estimation. This work proposes a process for including priors in speaker diarization with agglomerative hierarchical clustering (AHC). It is also shown that the exclusion of a prior with AHC is it...
Clustering initialization based on spatial information for speaker diarization of meetings
This paper proposes an initialization for an agglomerative system applied to speaker diarization in the meeting environment. The initialization is based on a previous clustering of the temporal sequence generated by estimating the Time Delay of Arrival (TDOA) between pairs of sensors. That initial clustering has the purpose of obtaining initial classes with speaker information from a sole s...